Abstract

In this paper, we introduce an exact algorithm with a time complexity of $O^*(1.299^m)$† for
the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem, where $m$ is the number of subsets in the
problem. This problem has important applications in recognizing mutation genes that cause different
cancer diseases.

1 Introduction



The SET COVER problem is the following: given a ground set $X$ of $n$ elements and a collection
$\mathcal{F}$ of $m$ subsets of $X$, find a minimum number of subsets $S_1, S_2, \ldots, S_h$ in
$\mathcal{F}$ such that $\bigcup_{i=1}^{h} S_i = X$. If we add the constraint that all subsets in the
solution are pairwise disjoint, then the SET COVER problem becomes the MUTUALLY EXCLUSIVE SET COVER
problem. If we further assign each subset in $\mathcal{F}$ a real-number weight and search for the
solution with the minimum weight, i.e. the sum of the weights of the subsets in the solution is
minimized, then the problem becomes the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem.

Recently, the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem has found important applications in
cancer studies to identify driver mutations [11[12], i.e. somatic mutations that cause cancers.
Somatic mutations change the structures (and therefore the functions) of signaling proteins and thus
perturb cancer pathways that regulate the expression of genes in certain important biological
processes, such as cell death and cell proliferation. The perturbations within a common cancer
pathway are often found to be mutually exclusive in a single cancer cell, i.e. each tumor usually has
only one perturbation on a given cancer pathway (one perturbation is enough to cause the disease;
hence, there is no need to wait for another perturbation). Modern lab techniques can identify somatic
mutations and gene expressions of cancer cells. After preprocessing the data, we obtain the following
information for important biological processes, e.g. cell death: 1) which cancer cells have disturbed
the expressions of genes in the biological process; 2) which genes have been mutated in those cancer
cells; 3) how likely each mutation is to be related to the given biological process (i.e. each
mutation is assigned a real-number weight). The next step is to find a set of mutations such that
each cancer cell has one and only one mutation in the solution set (mutually exclusive) and the sum
of the weights of all genes in the solution set is minimized, which is the WEIGHTED MUTUALLY
EXCLUSIVE SET COVER problem.

†Note: Following the recent convention, we use a star * to indicate that the polynomial part of the
time complexity is neglected.

While there is not much research on the MUTUALLY EXCLUSIVE SET COVER or the WEIGHTED MUTUALLY
EXCLUSIVE SET COVER problems, the SET COVER problem has received much attention. The SET COVER
problem, which is equivalent to the HITTING SET problem, is a fundamental NP-hard problem in Karp's
21 NP-complete problems [5]. One research direction for the SET COVER problem is approximation
algorithms, e.g. papers [H El [9l [11] gave polynomial-time approximation algorithms that find
solutions whose sizes are at most $c \log n$ times the size of the optimal solution, where $c$ is a
constant. A second direction uses the number of subsets in the solution, $k$, as the parameter to
design fixed-parameter tractable (FPT) algorithms for the equivalent problem, the HITTING SET
problem. Those algorithms have the constraint that each element in $X$ is included in at most $d$
subsets in $\mathcal{F}$, i.e. the sizes of all subsets in the HITTING SET problem are upper bounded
by $d$; this is also called the $d$-HITTING SET problem. For example, paper [13] gave an
$O^*(2.270^k)$ algorithm for the 3-HITTING SET problem, and a later paper further improved the time
complexity to $O^*(2.179^k)$. The third direction is designing algorithms that use $n$ as the
parameter under the condition that $n$ is much less than $m$. Papers [2] [7] designed algorithms with
time complexities of $O^*(2^n)$ for the problem. The paper [2] also extended the algorithm to solve
the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem with the same time complexity. Paper [10] improved
the time complexity to

O* (2 i+i°s2 d. ■) under the condition that at least -, . „ elements in X are included in at most d subsets 
in $\mathcal{F}$. This algorithm can also be extended to the WEIGHTED MUTUALLY EXCLUSIVE SET COVER
problem with the same time complexity. However, in the cancer-study application, $n$ is not less than
$m$, nor is each element in $X$ included in a bounded number of subsets in $\mathcal{F}$. Hence,
there is a need to design new algorithms.

In this paper, we design a new algorithm that uses $m$ as the parameter (in the cancer-study
application, $m$ is smaller than $n$, where $n$ can be as large as several hundred). Trivially, using
$m$ as the parameter, we can solve the problem in time $O^*(2^m)$ with an algorithm that simply tests
every combination of subsets in $\mathcal{F}$. To the best of our knowledge, no algorithm better than
this trivial one is known when $m$ is used as the parameter. This paper gives the first non-trivial
algorithm, with a time complexity of $O^*(1.299^m)$, to solve the WEIGHTED MUTUALLY EXCLUSIVE SET
COVER problem. We have tested this algorithm in the cancer study, and the program finishes the
computation in practice when $m$ is less than 100.
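The trivial $O^*(2^m)$ baseline mentioned above can be sketched as follows. This is only an illustrative sketch, not code from the paper: the function name `brute_force_wmes_cover` and the list-of-sets input format are assumptions made here, and every subset is assumed to be contained in the ground set.

```python
from itertools import combinations


def brute_force_wmes_cover(ground_set, subsets, weights):
    """Test every combination of subsets (hypothetical helper, O*(2^m) time).

    Returns (best_weight, indices_of_chosen_subsets), or None if no
    mutually exclusive set cover exists.
    """
    universe = frozenset(ground_set)
    m = len(subsets)
    best = None
    for r in range(m + 1):
        for combo in combinations(range(m), r):
            covered = set()
            total_size = 0
            for i in combo:
                covered |= subsets[i]
                total_size += len(subsets[i])
            # Subsets of X are pairwise disjoint and cover X exactly when
            # their union is X and their sizes add up to |X|.
            if covered == universe and total_size == len(universe):
                weight = sum(weights[i] for i in combo)
                if best is None or weight < best[0]:
                    best = (weight, combo)
    return best
```

For example, with ground set `{1, 2, 3, 4}`, subsets `[{1, 2}, {3, 4}, {2, 3}, {1}, {4}, {1, 2, 3, 4}]` and weights `[10, 15, 5, 4, 4, 30]`, the three valid covers have weights 25, 13 and 30, so the sketch returns the cover `{2, 3}, {1}, {4}`.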

2 The WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem is NP-hard 

The formal definition of the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem is: given a ground set
$X$ of $n$ elements, a collection $\mathcal{F}$ of $m$ subsets of $X$, and a weight function
$w : \mathcal{F} \to [0, \infty)$, if $\mathcal{F}' = \{S_1, S_2, \ldots, S_h\} \subseteq \mathcal{F}$
such that $\bigcup_{i=1}^{h} S_i = X$ and $S_i \cap S_j = \emptyset$ for any $i \ne j$, then we say
$\mathcal{F}'$ is a mutually exclusive set cover of $X$ and $\sum_{i=1}^{h} w(S_i)$ is the weight of
$\mathcal{F}'$; the goal of the problem is to find a mutually exclusive set cover of $X$ with the
minimum weight, or report that no such solution exists.

As we have not found a proof of NP-hardness for the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem,
in this section we prove that the MUTUALLY EXCLUSIVE SET COVER problem is NP-hard; this implies that
the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem is NP-hard (take all weights equal to 1).

We prove the NP-hardness of the MUTUALLY EXCLUSIVE SET COVER problem by reducing another NP-hard
problem, the MAXIMUM SET PACKING problem, to it. Recall that the MAXIMUM SET PACKING problem is:
given a collection $\mathcal{F}$ of subsets, find an $\mathcal{S} \subseteq \mathcal{F}$ such that
the subsets in $\mathcal{S}$ are pairwise disjoint and $|\mathcal{S}|$ is maximized.

Theorem 2.1 The MUTUALLY EXCLUSIVE SET COVER problem is NP-hard.

Proof. Let $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ be an instance of the MAXIMUM SET PACKING
problem, where $X' = \bigcup_{i=1}^{m} S_i = \{x_1, x_2, \ldots, x_n\}$. We create an instance of the
MUTUALLY EXCLUSIVE SET COVER problem such that:

• $X = X' \cup T_1 \cup T_2 \cup \cdots \cup T_m$, where $T_j = \{t_{j1}, t_{j2}, \ldots, t_{j(n+1)}\}$
for all $1 \le j \le m$;

• $\mathcal{F} = \mathcal{F}' \cup \mathcal{F}'' \cup \mathcal{F}'''$, where
$\mathcal{F}' = \{\{x_1\}, \{x_2\}, \ldots, \{x_n\}\}$,
$\mathcal{F}'' = \{S_1 \cup T_1, S_2 \cup T_2, \ldots, S_m \cup T_m\}$, and
$\mathcal{F}''' = \bigcup_{j=1}^{m} \{\{t_{j1}\}, \{t_{j2}\}, \ldots, \{t_{j(n+1)}\}\}$.

Next, we prove that if $\mathcal{P} = \{P_1, P_2, \ldots, P_h\}$ is a solution of the MUTUALLY
EXCLUSIVE SET COVER problem, then $\mathcal{S}' = \{S'_1, S'_2, \ldots, S'_k\}$ is a solution of the
MAXIMUM SET PACKING problem, where
$\mathcal{P} \cap \mathcal{F}'' = \{S'_1 \cup T'_1, S'_2 \cup T'_2, \ldots, S'_k \cup T'_k\}$. This
shows that the time to solve the MAXIMUM SET PACKING problem is bounded by the total time of
transforming the MAXIMUM SET PACKING problem into the MUTUALLY EXCLUSIVE SET COVER problem and of
solving the MUTUALLY EXCLUSIVE SET COVER problem. Therefore, the MUTUALLY EXCLUSIVE SET COVER
problem is NP-hard.
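The construction used in the reduction can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than part of the proof: the function name `packing_to_exclusive_cover` is hypothetical, elements are tagged with `"x"` or `"t"` so that the fresh elements of each $T_j$ cannot clash with $X'$, indices are 0-based, and each $T_j$ is given $n + 1$ fresh elements.

```python
def packing_to_exclusive_cover(packing_sets):
    """Build (X, F) of the cover instance from a set-packing instance S_1..S_m."""
    elements = sorted(set().union(*packing_sets))        # X' = {x_1, ..., x_n}
    n, m = len(elements), len(packing_sets)
    # T_j: n + 1 fresh elements per packing set (assumed size, see lead-in).
    fresh = [[("t", j, i) for i in range(n + 1)] for j in range(m)]
    ground = {("x", e) for e in elements} | {t for T in fresh for t in T}
    # F': one singleton per original element.
    f1 = [frozenset({("x", e)}) for e in elements]
    # F'': each packing set S_j glued to its private block T_j.
    f2 = [frozenset(("x", e) for e in S) | frozenset(T)
          for S, T in zip(packing_sets, fresh)]
    # F''': one singleton per fresh element.
    f3 = [frozenset({t}) for T in fresh for t in T]
    return ground, f1 + f2 + f3
```

With the packing instance `[{1, 2}, {2, 3}, {4}]` we have $n = 4$ and $m = 3$, so the sketch produces a ground set of $4 + 3 \cdot 5 = 19$ elements and a family of $4 + 3 + 15 = 22$ subsets.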

As the subsets in $\mathcal{P}$ are pairwise disjoint, it is obvious that the subsets in
$\mathcal{S}'$ are pairwise disjoint. Hence, if we suppose that $\mathcal{S}'$ is not a solution of
the MAXIMUM SET PACKING problem, then there must exist an
$\mathcal{S}'' = \{S''_1, S''_2, \ldots, S''_{k'}\} \subseteq \mathcal{S}$ such that the subsets in
$\mathcal{S}''$ are pairwise disjoint and $k' > k$. Thus we can make a new solution $\mathcal{P}'$
of the MUTUALLY EXCLUSIVE SET COVER problem such that $\mathcal{P}'$ includes
$\{S''_1 \cup T''_1, S''_2 \cup T''_2, \ldots, S''_{k'} \cup T''_{k'}\} \subseteq \mathcal{F}''$ and
other subsets in $\mathcal{F}'$ and $\mathcal{F}'''$. Let $|X' - \bigcup_{i=1}^{k} S'_i| = n_1$ and
$|X' - \bigcup_{i=1}^{k'} S''_i| = n_2$ (Note: any $T_j$ that is not covered by a subset in
$\mathcal{F}''$ needs $n + 1$ subsets in $\mathcal{F}'''$ to cover it; any $x_i \in X'$ that is not
covered by a subset in $\mathcal{F}''$ needs a subset in $\mathcal{F}'$ to cover it). Then

$$|\mathcal{P}| = k + (m - k)(n + 1) + n_1,$$

and

$$|\mathcal{P}'| = k' + (m - k')(n + 1) + n_2.$$

Therefore $|\mathcal{P}| - |\mathcal{P}'| = (k' - k)n + n_1 - n_2 > 0$, i.e. $\mathcal{P}'$ is a
solution with fewer subsets in $\mathcal{F}$, which contradicts the assumption that $\mathcal{P}$ is
the solution of the MUTUALLY EXCLUSIVE SET COVER problem. Hence, $\mathcal{S}'$ is a solution of the
MAXIMUM SET PACKING problem. □
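The final inequality of the proof can be checked term by term. The derivation below only expands the two cardinalities stated above; the last step assumes that the subsets in $\mathcal{S}''$ are nonempty, so that $n_2 \le n - 1$, which is an assumption made here and not stated in the text.

```latex
\begin{align*}
|\mathcal{P}| - |\mathcal{P}'|
  &= \bigl[k + (m-k)(n+1) + n_1\bigr] - \bigl[k' + (m-k')(n+1) + n_2\bigr] \\
  &= (k - k') + (k' - k)(n+1) + n_1 - n_2 \\
  &= (k' - k)\,n + n_1 - n_2 \\
  &\ge n + 0 - (n - 1) = 1 > 0
  \qquad \text{since } k' - k \ge 1,\ n_1 \ge 0,\ n_2 \le n - 1 .
\end{align*}
```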

3 The main Algorithm 

In this section, we introduce our new algorithm to solve the WEIGHTED MUTUALLY EXCLUSIVE SET COVER
problem.

Let $(X, \mathcal{F}, w)$ be an instance of the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem. We
can use a bipartite graph to represent $(X, \mathcal{F}, w)$ such that the nodes on one side are the
subsets in $\mathcal{F}$ while the nodes on the other side are the elements in $X$, and if an element
$u$ of $X$ is in a subset $U$, i.e. $u \in U$, then an edge is added between $u$ and $U$. For
convenience, let us introduce some notation. Figure 1 illustrates the notation that follows.




Figure 1: Graph representation and some notations of the problem 

For any $x \in X$, let $neighbor(x) = \{S \mid S \in \mathcal{F} \text{ and } x \in S\}$,
$degree(x) = |neighbor(x)|$, and $partner(x) = \bigcup_{S \in neighbor(x)} S$. For any $y$ in
$partner(x)$, let $neighbor_{in}(y) = neighbor(y) \cap neighbor(x)$,
$degree_{in}(y) = |neighbor_{in}(y)|$, $neighbor_{out}(y) = neighbor(y) - neighbor(x)$, and
$degree_{out}(y) = |neighbor_{out}(y)|$.
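The notation above maps directly onto a few small set operations. The following sketch is illustrative only: it assumes the family $\mathcal{F}$ is stored as a Python list of sets and identifies each subset by its list index, and the extra argument `x` in `neighbor_in` and `neighbor_out` makes explicit the element relative to which "in" and "out" are measured.

```python
def neighbor(x, family):
    """Indices of the subsets in the family that contain element x."""
    return {i for i, S in enumerate(family) if x in S}


def degree(x, family):
    """Number of subsets that contain x."""
    return len(neighbor(x, family))


def partner(x, family):
    """Union of all subsets that contain x (x itself included)."""
    return set().union(*(family[i] for i in neighbor(x, family)))


def neighbor_in(y, x, family):
    """Subsets containing y that also contain x."""
    return neighbor(y, family) & neighbor(x, family)


def neighbor_out(y, x, family):
    """Subsets containing y that do not contain x."""
    return neighbor(y, family) - neighbor(x, family)
```

For the family `[{1, 2}, {1, 3}, {2, 4}, {3, 4, 5}]` and `x = 1`, the neighbors of `x` are the subsets with indices 0 and 1, `partner(1)` is `{1, 2, 3}`, and for `y = 2` the only outside neighbor is the subset with index 2.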

Algorithm-1 WMES-Cover((X, ℱ, w), Solution_partial, Solution_final)

Input: An instance (X, ℱ, w) of the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem and two variables,
       where Solution_final is a global variable to keep the best solution.
Output: A minimum weight mutually exclusive set cover or "No Solution".

1 if X == ∅ then
  1.1 if weight(Solution_partial) < weight(Solution_final) then replace Solution_final with Solution_partial;
2 Find x ∈ X such that d = degree(x) is minimized;
3 if d == 0 then return "No Solution";
4 if d == 1 then WMES-Cover((X - {x}, ℱ - neighbor(x), w), Solution_partial ∪ neighbor(x), Solution_final);
5 if degree_out(y) == 0 for all y ∈ partner(x) then
  5.1 if there exists S ∈ neighbor(x) such that S == partner(x) then
    5.1.1 WMES-Cover((X - S, ℱ - neighbor(x), w), Solution_partial ∪ {S}, Solution_final);
  else
    5.1.2 return "No Solution";
6 if d == 2 then // Suppose neighbor(x) = {S_1, S_2}; note that S_1 ⊆ X and S_2 ⊆ X.
  6.1 WMES-Cover((X - S_1, ℱ - ∪_{u ∈ S_1} neighbor(u), w), Solution_partial ∪ {S_1}, Solution_final);
  6.2 WMES-Cover((X - S_2, ℱ - ∪_{u ∈ S_2} neighbor(u), w), Solution_partial ∪ {S_2}, Solution_final);
else // (Note: d > 2)
  6.3 if there exists a y ∈ partner(x) such that degree_out(y) = 1 then
    6.3.1 Let y ∈ partner(x) such that degree_out(y) = 1 and W' ∈ neighbor_out(y);
    6.3.2 if |neighbor(x) - neighbor(y)| > 0 then // (Note: |neighbor(x) - neighbor(y)| ≤ 1)
      6.3.2.1 Find any W ∈ neighbor(x) - neighbor(y);
      6.3.2.2 WMES-Cover((X - (W' ∪ W), ℱ - ∪_{u ∈ W' ∪ W} neighbor(u), w), Solution_partial ∪ {W', W}, Solution_final);
      6.3.2.3 WMES-Cover((X, ℱ - {W', W}, w), Solution_partial, Solution_final);
    else
      6.3.2.4 Find any W ∈ neighbor(x);
      6.3.2.5 WMES-Cover((X - W, ℱ - ∪_{u ∈ W} neighbor(u), w), Solution_partial ∪ {W}, Solution_final);
      6.3.2.6 WMES-Cover((X, ℱ - {W', W}, w), Solution_partial, Solution_final);
  else
    6.3.3 Find a y ∈ partner(x) such that degree_out(y) is maximized;
    6.3.4 Find a Z ∈ neighbor_in(y);
    6.3.5 WMES-Cover((X - Z, ℱ - ∪_{u ∈ Z} neighbor(u), w), Solution_partial ∪ {Z}, Solution_final);
    6.3.6 WMES-Cover((X, ℱ - {Z}, w), Solution_partial, Solution_final);



Figure 2: Algorithm for the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem.

The main algorithm, Algorithm-1, is shown in Figure 2. Basically, Algorithm-1 first finds an
$x \in X$ with minimum degree and then branches at one subset in $neighbor(x)$ (such as in step 6.2.2
and 6.2.3). For convenience, if $degree(x) = d$, then we say that Algorithm-1 is doing a $d$-branch.
Because of steps 3, 4, and 5, when the program arrives at step 6, we must have: 1)
$d = degree(x) \ge 2$; 2) for any $u \in X$, $degree(u) \ge d$; 3) there exists a
$y \in partner(x)$ such that $degree_{out}(y) > 0$.
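To make the search-tree idea concrete, here is a deliberately simplified sketch. It is not Algorithm-1: it keeps only the minimum-degree selection of step 2 and the "include a subset" branches, and omits the case analysis of step 6 that yields the $1.299^m$ bound. The function name `wmes_cover_simple` and the list-of-sets representation are assumptions made for illustration, and every subset is assumed to be contained in the ground set.

```python
import math


def wmes_cover_simple(ground, family, weights):
    """Exact search for a minimum-weight mutually exclusive set cover.

    Returns (weight, sorted list of chosen subset indices), or None.
    """
    best = [math.inf, None]

    def search(remaining, candidates, chosen, weight):
        if not remaining:                      # X is empty: a full cover was found
            if weight < best[0]:
                best[0], best[1] = weight, sorted(chosen)
            return
        # Pick an uncovered element of minimum degree among the live subsets.
        x = min(remaining,
                key=lambda e: sum(1 for i in candidates if e in family[i]))
        options = [i for i in candidates if x in family[i]]
        # Exactly one subset containing x must be chosen; if there is none
        # (degree 0), the loop body never runs and this branch dies.
        for i in options:
            S = family[i]
            # Including S removes its elements and every subset that meets S.
            rest = [j for j in candidates if not (family[j] & S)]
            search(remaining - S, rest, chosen + [i], weight + weights[i])

    search(frozenset(ground), list(range(len(family))), [], 0)
    return None if best[1] is None else (best[0], best[1])
```

Branching over all subsets that contain the chosen element is the classic exact-cover strategy; Algorithm-1 refines it by branching on a single well-chosen subset (include or exclude), which is what makes the recurrences in the propositions below possible.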

Algorithm-1 searches for the solution by going through a search tree; hence, if we know the number
of leaves in the search tree, we obtain the time complexity of Algorithm-1. Next, we estimate the
number of leaves in the search tree by studying the different cases of branching. We begin with the
2-branch.

Proposition 3.1 The search tree has at most $1.273^m$ leaves if only the 2-branches are applied in
Algorithm-1.

Proof. Suppose that $degree(x) = 2$ and $y \in partner(x)$ such that $degree_{out}(y) > 0$. Let
$neighbor(x) = \{S_1, S_2\}$.
In the case of $degree_{out}(y) = 1$, let $neighbor_{out}(y) = \{S''\}$. In the branches of choosing
either $S_1$ or $S_2$ into the solution, if $y$ is covered, then $S''$ will be removed from
$\mathcal{F}$; otherwise, if $y$ is not covered yet, then $S''$ will be chosen into the solution in
order to cover $y$ (note: after $S_1, S_2$ are removed, $degree(y) = 1$ in the new instance (at steps
6.1 and 6.2 of Algorithm-1); thus, $S''$ will be included into the solution in the next call of
Algorithm-1 in this branch). Hence, in any case, 3 subsets in $\mathcal{F}$ will be removed. Letting
$T(k)$ be the number of leaves in the search tree when $|\mathcal{F}| = k$, we obtain the following
recurrence relation

$$T(k) \le 2T(k-3). \tag{1}$$

The characteristic equation of this recurrence relation is $t^3 - 2 = 0$; hence, we have
$T(m) \le 1.260^m$*.

In the case of $degree_{out}(y) > 1$, we consider the following sub-cases.

Sub-case 1. Suppose $degree_{in}(y) = 1$ and $y \in S_1$. Then at least $S_1$ and $S_2$ will be
removed from $\mathcal{F}$ in the branch of choosing $S_2$ into the solution; at least $S_1$, $S_2$,
and all subsets (at least two) in $neighbor_{out}(y)$ will be removed in the branch of choosing $S_1$
into the solution. Thus the recurrence relation of $T(k)$ is

$$T(k) \le T(k-2) + T(k-4), \tag{2}$$

which leads to $T(m) \le 1.273^m$.

Sub-case 2. Suppose $degree_{in}(y) = 2$. Then in either branch, $y$ is covered by $S_1$ or $S_2$,
whichever is chosen into the solution. Hence, $S_1$, $S_2$, and all subsets (at least two) in
$neighbor_{out}(y)$ will be removed from $\mathcal{F}$. Thus we obtain the recurrence relation

$$T(k) \le 2T(k-4), \tag{3}$$

which leads to $T(m) \le 1.190^m$.

By considering all the above cases, we obtain $T(m) \le 1.273^m$. □
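The bases quoted for recurrences (1) to (3) can be reproduced numerically. The sketch below is an illustration, with a hypothetical helper name `branching_number`: for a recurrence $T(k) \le \sum_i T(k - s_i)$ with at least two positive shifts $s_i$, it finds by bisection the root $r > 1$ of $\sum_i r^{-s_i} = 1$, which is the characteristic equation divided by $r^k$.

```python
def branching_number(shifts, tol=1e-12):
    """Root r > 1 of sum(r**-s for s in shifts) == 1, found by bisection.

    `shifts` lists how many subsets each branch removes, e.g. [3, 3] for
    T(k) <= 2T(k-3). At least two positive shifts are assumed.
    """
    def excess(r):
        # Strictly decreasing in r for positive shifts.
        return sum(r ** -s for s in shifts) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:          # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi
```

Calling it with `[3, 3]`, `[2, 4]` and `[4, 4]` gives roots just below 1.260, 1.273 and 1.190 respectively, matching the bounds stated for (1), (2) and (3).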

Now, we consider the case of doing a 3-branch. Remember that when Algorithm-1 is doing a 3-branch,
$degree(x) \ge 3$ for all $x \in X$.

Proposition 3.2 The search tree has at most $1.299^m$ leaves if only the $d$-branches for $d \le 3$
are applied in Algorithm-1.

Proof. The cases of 2-branches are considered in the last proposition. Now we consider the cases
of 3-branches. Suppose that $degree(x) = 3$ and $y \in partner(x)$ such that $degree_{out}(y) > 0$.
Let $neighbor(x) = \{S_1, S_2, S_3\}$.

If $degree_{out}(y) = 1$, then $degree_{in}(y) \ge 2$ (as $degree(y) \ge 3$). Let
$\{S'\} = neighbor_{out}(y)$. We further consider the following sub-cases.

Sub-case 1. Suppose $degree_{in}(y) = 2$. Let $S_1 \in neighbor(x) - neighbor(y)$. Algorithm-1
branches at $S_1$. Branch one includes $S_1$ into the solution; thus, $S_2$ and $S_3$ will be
removed. This further makes $degree(y) = 1$. Hence, $S'$ will also be included into the solution. In
total, in this branch, we remove at least 4 subsets from $\mathcal{F}$. In branch two, we exclude
$S_1$ from the solution. Then either $S_2$ or $S_3$ must be included into the solution. Thus $y$ is
covered by $S_2$ or $S_3$, and $S'$ will not be in the solution. Therefore, in this branch, at least
$S_1$ and $S'$ will be removed. So we obtain the recurrence relation

$$T(k) \le T(k-2) + T(k-4), \tag{4}$$

which leads to $T(m) \le 1.273^m$.

*Note: Given a recurrence relation $T(k) \le \sum_{i=0}^{k-1} c_i T(i)$ such that all $c_i$ are
nonnegative real numbers, $\sum_{i=0}^{k-1} c_i > 0$, and $T(0)$ represents the leaves, then
$T(k) \le r^k$, where $r$ is the unique positive root of the characteristic equation
$t^k - \sum_{i=0}^{k-1} c_i t^i = 0$ deduced from the recurrence relation [3].
Sub-case 2. Suppose $degree_{in}(y) = 3$. Then $S'$ will not be in the solution and any one of
$S_1, S_2, S_3$ (one and only one of them must be included into the solution to cover $x$) will cover
$y$. Algorithm-1 will branch at any one of $S_1, S_2, S_3$. Without loss of generality, we branch at
$S_1$. In the branch of including $S_1$ into the solution, $S_1, S_2, S_3$ will be removed, so that
at least 4 subsets are removed in total (counting $S'$). In the branch of excluding $S_1$ from the
solution, $S_1$ and $S'$ will be removed. Thus 2 subsets will be removed. We obtain the following
recurrence relation

$$T(k) \le T(k-2) + T(k-4), \tag{5}$$

which leads to $T(m) \le 1.273^m$.

In the case of $degree_{out}(y) > 1$, let $S_1 \in neighbor_{in}(y)$. Algorithm-1 branches at $S_1$.
In the first branch, $S_1$ is included into the solution. Then $S_1, S_2, S_3$ and at least 2 subsets
in $neighbor_{out}(y)$ will be removed. In the second branch, $S_1$ is excluded, which makes
$degree(x) = 2$ in the new instance; hence, in this branch, a 2-branch will follow. Thus, even
considering the worst case of the 2-branch (the recurrence relation (2)), we have

$$T(k) \le 2T(k-5) + T(k-3), \tag{6}$$

which leads to $T(m) \le 1.299^m$.

From all the above cases and Proposition 3.1, we have $T(m) \le 1.299^m$. □
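Recurrence (6) combines the two branches of the 3-branch with the worst-case 2-branch. The short derivation below spells out that composition; it is a reading of the argument above, and the numerical root is an approximation computed here rather than a value given in the text.

```latex
\begin{align*}
T(k) &\le \underbrace{T(k-5)}_{\text{include } S_1}
        + \underbrace{T(k-1)}_{\text{exclude } S_1} \\
     &\le T(k-5) + \bigl[T\bigl((k-1)-2\bigr) + T\bigl((k-1)-4\bigr)\bigr]
        && \text{2-branch, recurrence (2)} \\
     &= 2T(k-5) + T(k-3).
\end{align*}
% Characteristic equation: t^5 - t^2 - 2 = 0, with positive root t \approx 1.298 < 1.299.
```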

Let us consider the case of doing a $d$-branch for $d > 3$.

Proposition 3.3 The search tree in Algorithm-1 has at most $1.299^m$ leaves.

Proof. We only need to consider the cases of $d$-branches for $d > 3$. Suppose that
$degree(x) = d$ and $y \in partner(x)$ such that $degree_{out}(y) > 0$. Let
$neighbor(x) = \{S_1, S_2, \ldots, S_d\}$.

In the case of $degree_{out}(y) = 1$, $degree_{in}(y)$ can only be $d - 1$ or $d$.

Sub-case 1. Suppose $degree_{in}(y) = d - 1$. Then there is one and only one subset in
$neighbor(x) - neighbor_{in}(y)$. Without loss of generality, we suppose
$S_1 \notin neighbor_{in}(y)$. Algorithm-1 will branch on $S_1$ such that in the branch of including
$S_1$ into the solution, all $d$ subsets in $neighbor(x)$ and the one subset in $neighbor_{out}(y)$
will be removed (i.e. in this branch, at least 5 subsets will be removed); in the branch of excluding
$S_1$ from the solution, one subset in $\{S_2, S_3, \ldots, S_d\}$ will be included into the
solution, so $y$ will be covered and the only subset in $neighbor_{out}(y)$ will be removed (i.e. in
this branch, two subsets will be removed). Therefore, we have the following recurrence relation

$$T(k) \le T(k-5) + T(k-2), \tag{7}$$

which leads to $T(m) \le 1.237^m$.

Sub-case 2. Suppose $degree_{in}(y) = d$. Without loss of generality, we suppose that Algorithm-1
branches on $S_1$. Then it is easy to see that we have the following recurrence relation

$$T(k) \le T(k-5) + T(k-2), \tag{8}$$

which leads to $T(m) \le 1.237^m$.

In the case of $degree_{out}(y) > 1$, suppose $S_1 \in neighbor_{in}(y)$ and Algorithm-1 branches on
$S_1$. Then in the branch of including $S_1$ into the solution, all subsets in $neighbor(x)$ and
$neighbor_{out}(y)$ will be removed (at least 6 subsets will be removed). In the branch of excluding
$S_1$ from the solution, at least one subset, $S_1$, will be removed. Hence, we have the recurrence
relation

$$T(k) \le T(k-6) + T(k-1), \tag{9}$$

which leads to $T(m) \le 1.286^m$.

Considering all the above cases, Proposition 3.1, and Proposition 3.2, we have
$T(m) \le 1.299^m$. □

Theorem 3.4 The WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem can be solved by an algorithm
with a time complexity of $O^*(1.299^m)$.

Proof. Let $(X, \mathcal{F}, w)$ be an instance of the WEIGHTED MUTUALLY EXCLUSIVE SET COVER
problem, where $X$ is a ground set of $n$ elements, $\mathcal{F}$ is a collection of $m$ subsets of
$X$, and $w : \mathcal{F} \to [0, \infty)$ is the weight function. Now we prove that the problem can
be solved by Algorithm-1 in time $O^*(1.299^m)$.

The correctness of the algorithm is easy to see. If there is an $x \in X$ such that
$degree(x) = 0$, then $x$ cannot be covered by any subset in $\mathcal{F}$. Thus, the problem has no
solution. Step 3 of Algorithm-1 deals with this situation. If, for some $x \in X$, $degree(x) = 1$,
then there exists one and only one subset in $\mathcal{F}$ that covers $x$, i.e. $neighbor(x)$ must
be included into the solution. Thus $x$ and $neighbor(x)$ will be removed from the problem. This
situation is dealt with in step 4. If for all $y$ in $partner(x)$, $degree_{out}(y) = 0$, then
$partner(x)$ can only be covered by subset(s) in $neighbor(x)$. By exclusivity, at most one subset in
$neighbor(x)$ can be chosen into the solution. Thus, if there is a subset $S$ in $neighbor(x)$ such
that $S = partner(x)$, then Algorithm-1 will include $S$ into the solution; otherwise the problem has
no solution. Step 5 of Algorithm-1 deals with this situation.

After Algorithm-1 reaches step 6, we have: 1) for all $x' \in X$, $degree(x') \ge degree(x) > 1$ (as
$x$ is the element in $X$ with the minimum degree); 2) there is a $y \in partner(x)$ such that
$degree_{out}(y) > 0$. If $d = |neighbor(x)| = 2$, then one and only one subset in $neighbor(x)$
will be in the solution. Steps 6.1 and 6.2 correctly deal with this situation. For the cases after
step 6.2, Algorithm-1 basically chooses one subset $S$ in $neighbor(x)$ and branches on $S$ such that
one branch includes $S$ into the solution and the other branch excludes $S$ from the solution (Note:
when $degree_{out}(y) = 1$, we used a small trick to include or exclude the additional subset in
$neighbor_{out}(y)$ into or from the solution; please refer to sub-case 1 and sub-case 2 in
Proposition 3.3). Therefore, Algorithm-1 will go through the search tree and find the solution with
the minimum weight (if a solution exists), which is saved in step 1.1.

By Proposition 3.3, the search tree has at most $1.299^m$ leaves. Hence, the time complexity of the
algorithm is bounded by $O^*(1.299^m)$. If we further notice that the time to process each node is
bounded by $O(mn)$, then the more accurate time complexity of the algorithm is $O(1.299^m mn)$. □

4 Problem extension 

In this paper, we first proved that the WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem is NP-hard.
Then we designed the first non-trivial algorithm, which uses $m$ as the parameter, with a time
complexity of $O^*(1.299^m)$ for the problem. The WEIGHTED MUTUALLY EXCLUSIVE SET COVER problem has
been used to find the driver mutations in cancers [U [12]. Our new algorithm can find the optimal
solution for the problem, which is better than the solutions found by the heuristic algorithms in the
previous research [Hll2j. Exclusivity is the extreme case. In practical applications, a cancer cell
may have more than one mutation that perturbs a common pathway. Hence, a modified model is to find a
set of mutations with minimum weight sum such that each cancer cell has at least one and at most $t$
($t = 2$ or $3$) mutations in the solution, which leads to the SMALL OVERLAPPED SET COVER problem.
Also, in applications, some mutations in cancer cells may not be detected because of errors. Thus, it
is not always ideal to find a solution whose mutations cover all cancer cells. A modified model is to
find a set of mutually exclusive mutations that cover at least $r$ percent (90% or 95%) of cancer
cells, which leads to the MAXIMAL SET COVER problem. Our next research will design efficient
algorithms for the above two new problems.
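The two relaxed models can be stated as simple feasibility checks on a candidate solution. The sketch below is only one possible reading of those models: the function names are hypothetical, the coverage threshold `r` is given as a fraction (for example 0.9) rather than a percentage, and no algorithm for finding such solutions is implied.

```python
from collections import Counter


def coverage_counts(ground, chosen_subsets):
    """How many chosen subsets contain each element (0 for uncovered ones)."""
    counts = Counter({x: 0 for x in ground})
    for S in chosen_subsets:
        for x in S:
            counts[x] += 1
    return counts


def is_small_overlap_cover(ground, chosen_subsets, t):
    """Every element is covered at least once and at most t times."""
    counts = coverage_counts(ground, chosen_subsets)
    return all(1 <= counts[x] <= t for x in ground)


def is_partial_exclusive_cover(ground, chosen_subsets, r):
    """Chosen subsets are pairwise disjoint and cover at least a fraction r."""
    counts = coverage_counts(ground, chosen_subsets)
    if any(c > 1 for c in counts.values()):
        return False                   # two chosen subsets overlap
    covered = sum(1 for x in ground if counts[x] == 1)
    return covered >= r * len(ground)
```

With $t = 1$ the first check, and with $r = 1$ the second check, both reduce to the original mutually exclusive set cover condition.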